Communication-avoiding Cholesky-QR2 for rectangular matrices
Abstract
The need for scalable algorithms to solve least squares and eigenvalue problems is becoming increasingly important given the rising complexity of modern machines. We address this concern by presenting a new scalable QR factorization algorithm intended to accelerate these problems for rectangular matrices. Our contribution is a communication-avoiding distributed-memory parallelization of an existing Cholesky-based QR factorization algorithm called CholeskyQR2. Our algorithm exploits a tunable processor grid able to interpolate between one and three dimensions, resulting in tradeoffs in the asymptotic costs of synchronization, horizontal bandwidth, flop count, and memory footprint. It improves the communication cost complexity with respect to state-of-the-art parallel QR implementations by a factor of Θ(P^{1/6}). Further, we provide implementation details and performance results on the Blue Waters supercomputer. We show that the costs attained are asymptotically equivalent to other communication-avoiding QR factorization algorithms and demonstrate that our algorithm is efficient in practice.
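For orientation, the sequential building block that the abstract refers to, CholeskyQR2, can be sketched as follows. This is a minimal single-process illustration in pure Python, not the distributed algorithm of the paper: it ignores the processor grid and all communication, and the helper names (`gram`, `cholesky_upper`, `apply_inverse_upper`, `matmul`, `cholesky_qr`, `cholesky_qr2`) are hypothetical names chosen for this sketch. It assumes the input has full column rank and is well enough conditioned for the Cholesky factorization of its Gram matrix to succeed.

```python
import math

def gram(a):
    """Return G = A^T A for an m x n matrix stored as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]

def cholesky_upper(g):
    """Return upper-triangular R with R^T R = G (G symmetric positive definite)."""
    n = len(g)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = g[i][i] - sum(r[k][i] ** 2 for k in range(i))
        r[i][i] = math.sqrt(s)  # raises ValueError if G is not positive definite
        for j in range(i + 1, n):
            r[i][j] = (g[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]
    return r

def apply_inverse_upper(a, r):
    """Return Q = A R^{-1}, solving x R = row for each row of A."""
    n = len(r)
    q = []
    for row in a:
        x = [0.0] * n
        for j in range(n):
            x[j] = (row[j] - sum(x[k] * r[k][j] for k in range(j))) / r[j][j]
        q.append(x)
    return q

def matmul(x, y):
    """Plain dense matrix product of two list-of-rows matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def cholesky_qr(a):
    """One CholeskyQR pass: R from the Gram matrix, then Q = A R^{-1}."""
    r = cholesky_upper(gram(a))
    return apply_inverse_upper(a, r), r

def cholesky_qr2(a):
    """Two CholeskyQR passes; the second pass repairs loss of orthogonality."""
    q1, r1 = cholesky_qr(a)
    q, r2 = cholesky_qr(q1)
    return q, matmul(r2, r1)  # A = Q (R2 R1)

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
q, r = cholesky_qr2(a)
```

In a parallel setting the Gram matrix is the natural place for communication to occur, since each process can form the product of its local rows and the partial results are then summed; how that sum and the subsequent steps are distributed over the processor grid is the subject of the paper and is not modeled here.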
Related resources
MATHEMATICAL ENGINEERING TECHNICAL REPORTS CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System
Designing communication-avoiding algorithms is crucial for high performance computing on a large-scale parallel system. The TSQR algorithm is a communication-avoiding algorithm for computing a tall-skinny QR factorization, and TSQR is known to be much faster than, and as stable as, the classical Householder QR algorithm. The Cholesky QR algorithm is another very simple and fast communication-avoiding a...
A Multilevel Block Incomplete Cholesky Preconditioner for Solving Rectangular Sparse Matrices from Linear Least Squares Problems
An incomplete factorization method for preconditioning symmetric positive definite matrices is introduced to solve normal equations. The normal equations are formed as a means to solve rectangular matrices from linear least squares problems. The procedure is based on a block incomplete Cholesky factorization and a multilevel recursive strategy with an approximate Schur complement matrix formed ...
Communication Avoiding (CA) and Other Innovative Algorithms
In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it ...
On Evaluating Parallel Sparse Cholesky Factorizations
Though many parallel implementations of sparse Cholesky factorization, accompanied by experimental results, have been proposed, it seems hard to evaluate the performance of these factorization methods theoretically because of the irregular structure of sparse matrices. This paper is an attempt at such an evaluation. On the basis of the criteria of parallel computation and communication time, we ...
LAPACK Cholesky Routines in Rectangular Full Packed Format
We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste half the storage space but provide high performance via the use of level 3 BLAS. Packed format arrays fully utilize storage (array space) b...
Journal title:
- CoRR
Volume abs/1710.08471
Publication date 2017